-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[RISCV] add load/store misched/PostRA subtarget features #149409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-risc-v Author: Daniel Henrique Barboza (danielhb) ChangesHi, This PR adds 4 new cluster options to fine tune how loads and stores are grouped during misched and postRA passes. The work was motivated by macrofusion tests with the veyron-v1 processor model. This model has stored pair macrofusions but it doesn't do much with load pairs. Adding new options to turn off load clustering improved the macrofusions amount and performance. The existing 2 cluster options added by @asb are kept because I'm afraid to break existing users that might be using them. If that' s not a concern we can remove them and keep the new options only. Full diff: https://github.com/llvm/llvm-project/pull/149409.diff 1 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index b43b915d0ad4f..24ddd3e1fd1c7 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -99,11 +99,31 @@ static cl::opt<bool> EnableMISchedLoadStoreClustering(
cl::desc("Enable load and store clustering in the machine scheduler"),
cl::init(true));
+static cl::opt<bool> EnableMISchedLoadClustering(
+ "riscv-misched-load-clustering", cl::Hidden,
+ cl::desc("Enable load clustering in the machine scheduler"),
+ cl::init(true));
+
+static cl::opt<bool> EnableMISchedStoreClustering(
+ "riscv-misched-store-clustering", cl::Hidden,
+ cl::desc("Enable store clustering in the machine scheduler"),
+ cl::init(true));
+
static cl::opt<bool> EnablePostMISchedLoadStoreClustering(
"riscv-postmisched-load-store-clustering", cl::Hidden,
cl::desc("Enable PostRA load and store clustering in the machine scheduler"),
cl::init(true));
+static cl::opt<bool> EnablePostMISchedLoadClustering(
+ "riscv-postmisched-load-clustering", cl::Hidden,
+ cl::desc("Enable PostRA load clustering in the machine scheduler"),
+ cl::init(true));
+
+static cl::opt<bool> EnablePostMISchedStoreClustering(
+ "riscv-postmisched-store-clustering", cl::Hidden,
+ cl::desc("Enable PostRA store clustering in the machine scheduler"),
+ cl::init(true));
+
static cl::opt<bool>
EnableVLOptimizer("riscv-enable-vl-optimizer",
cl::desc("Enable the RISC-V VL Optimizer pass"),
@@ -301,10 +321,13 @@ ScheduleDAGInstrs *
RISCVTargetMachine::createMachineScheduler(MachineSchedContext *C) const {
ScheduleDAGMILive *DAG = createSchedLive(C);
if (EnableMISchedLoadStoreClustering) {
- DAG->addMutation(createLoadClusterDAGMutation(
- DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
- DAG->addMutation(createStoreClusterDAGMutation(
- DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+ if (EnableMISchedLoadClustering)
+ DAG->addMutation(createLoadClusterDAGMutation(
+ DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+
+ if (EnableMISchedStoreClustering)
+ DAG->addMutation(createStoreClusterDAGMutation(
+ DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
}
const RISCVSubtarget &ST = C->MF->getSubtarget<RISCVSubtarget>();
@@ -318,10 +341,13 @@ ScheduleDAGInstrs *
RISCVTargetMachine::createPostMachineScheduler(MachineSchedContext *C) const {
ScheduleDAGMI *DAG = createSchedPostRA(C);
if (EnablePostMISchedLoadStoreClustering) {
- DAG->addMutation(createLoadClusterDAGMutation(
- DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
- DAG->addMutation(createStoreClusterDAGMutation(
- DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+ if (EnablePostMISchedLoadClustering)
+ DAG->addMutation(createLoadClusterDAGMutation(
+ DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+
+ if (EnablePostMISchedStoreClustering)
+ DAG->addMutation(createStoreClusterDAGMutation(
+ DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
}
return DAG;
|
@@ -301,10 +321,13 @@ ScheduleDAGInstrs * | |||
RISCVTargetMachine::createMachineScheduler(MachineSchedContext *C) const { | |||
ScheduleDAGMILive *DAG = createSchedLive(C); | |||
if (EnableMISchedLoadStoreClustering) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even though the old option is preserved, it now does nothing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All added options are defaulted to 'true', similar to the 2 existing options. But the older options have precedence. In the commit msg I gave an example:
"To preserve the existing behavior of the existing flags all options added
are default 'true'. Note that the two existing flags have precedence, so
setting "riscv-misched-load-store-clustering=false" will disable load
mutations even if setting riscv-misched-load-clustering=true. Same thing
with stores and with the postRA options."
So riscv-misched-load-store-clustering=false will turn off misched clustering for both loads and stores, regardless of riscv-misched-(load/store)-clustering being set to 'true' in the same command line.
Perhaps a better wording would be "disabling takes precedence" because that's how the code works, e.g.:
- riscv-misched-load-store-clustering=false will always disable both load and store clustering for misched;
- riscv-misched-load-clustering=false will always disable load clustering for misched;
- riscv-misched-store-clustering=false will always disable store clustering for misched.
I thought about inverting the semantics of the flag, i.e. adding options that would disable steps instead of enabling, but that didn't make the situation much better. What would make it better is to remove the existing 2 options, but then we'll have to deal with potential broken scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, got it. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but let's wait to see what others say if it is OK to remove the old options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally in favor of removing the existing two options (i.e. riscv-misched-load-store-clustering
and riscv-postmisched-load-store-clustering
). This is not a Clang flag and LLVM flags change all the time.
Non blocking, but an alternative solution to avoid having two flags could be making -riscv-misched-load-store-clustering
(and its post-RA counterpart) to take a list of "load" or "store" instead to enable either or both clusterings.
1576ca9
to
9de5329
Compare
Thanks for the reviews. Removing the original flags is a cleaner way of going about it so I just did that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM w/ minor comments
Please avoid force push unless necessary: https://llvm.org/docs/GitHub.html#rebasing-pull-requests-and-force-pushes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add new tests checking the situation where only either of the flags is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll add two new tests disabling load and store clustering respectively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
static cl::opt<bool> EnableMISchedLoadStoreClustering( | ||
"riscv-misched-load-store-clustering", cl::Hidden, | ||
cl::desc("Enable load and store clustering in the machine scheduler"), | ||
static cl::opt<bool> EnableMISchedLoadClustering( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these come from subtarget instead of command line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I miss something? We don't have subtarget features for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my point. Should we add Subtarget features instead? The description of the PR suggests that whether this is profitable is CPU specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, make sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By Subtarget feature are we talking about a a feature, like FeatureUnalignedScalarMem, or a tune like TunePostRAScheduler (took the examples from sifive_p670)? Seems to me that this would be more like a tuning like PostRAScheduler.
Also, do we want to keep the default as is (enable all load/store clustering in both stages) and add features to disable it, or the other way around? Having everything disabled by default and adding subtarget features to enable what you want is cleaner, but we'll change behavior for everyone that doesn't add the new features in their target defs. Perhaps this is a good thing (we'll force people to make a decision instead of relying on defaults) ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked how TunePostRAScheduler is implemented and it has both a subtarget feature and also a flag (post-RA-scheduler) that can be used to override the target definition. I like this design because I give the flexibility of the command line (for debugging and whatnot) while also giving each processor a convenient way of setting clustering preferences.
If no one opposes I'll go this direction, but I guess we'll want to turn the default to 'false' and let each processor define what clustering they want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the default settings of load/store clustering breaks 290+ tests in check-llvm. This shows that there's a LOT of code and logic built on top of the existing default. I don't want to touch this viper nest in this work so I'll keep the default as is. This means that the subtarget features will disable the default settings instead of enabling - TuneNoDefaultUnroll would be an example.
Also, after adding both the command line options and subtarget features I realized that I might be complicating things too much, given that we can override the subtarget features via "mattr=-" anyway, so I'll change the code to send just the subtarget features. I'll send a new version shortly.
@@ -1700,6 +1700,18 @@ def TuneNLogNVRGather | |||
def TunePostRAScheduler : SubtargetFeature<"use-postra-scheduler", | |||
"UsePostRAScheduler", "true", "Schedule again after register allocation">; | |||
|
|||
def TuneDisableMISchedLoadClustering : SubtargetFeature<"disable-misched-load-clustering", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @topperc meant to mimic something like "bool RISCVSubtarget::useAA()" for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I meant something we could set per-CPU like this.
@@ -1700,6 +1700,18 @@ def TuneNLogNVRGather | |||
def TunePostRAScheduler : SubtargetFeature<"use-postra-scheduler", | |||
"UsePostRAScheduler", "true", "Schedule again after register allocation">; | |||
|
|||
def TuneDisableMISchedLoadClustering : SubtargetFeature<"disable-misched-load-clustering", | |||
"NoMISchedLoadClustering", "true", "Disable load clustering in the machine scheduler">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"NoMISchedLoadClustering", "true", "Disable load clustering in the machine scheduler">; | |
"MISchedLoadClustering", "false", "Disable load clustering in the machine scheduler">; |
Negate the enum and the value. See also TuneNoDefaultUnroll
and TuneNoSinkSplatOperands
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add Enable
before the enum, like EnableMISchedLoadClustering
. So that the auto-generated methods will be like enableXXX()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That explains why I didn't find an enableDefaultUnroll() method ...
@@ -150,6 +150,14 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo { | |||
|
|||
bool enablePostRAScheduler() const override { return UsePostRAScheduler; } | |||
|
|||
bool enableMISchedLoadClustering() const { return MISchedLoadClustering; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need these because there are auto-generated methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Ping Do we require more reviews or is this good to go? Also, I don't have write access (this is my first PR for LLVM) so I'm afraid someone will have to do the honors merging this one. I'm not sure if a rebase is needed due to the conflict being pointed out too ... |
Updating the branch to fix conflicts |
8561859
to
170d0ae
Compare
You have enough reviews. The reviewers probably didn't think about you not having write permission. |
Some processors benefit more from store clustering than load clustering, and vice-versa, depending on factors that are exclusive to each one (e.g. macrofusions implemented). Likewise, certain optimizations benefits more from misched clustering than postRA clustering. Macrofusions are again an example: in a processor with store pair macrofusions, like the veyron-v1, it is observed that misched clustering increases the amount of macrofusions more than postRA clustering. This of course isn't necessarily true for other processors, but it shows that processors can benefit from a more fine grained control of clustering mutations, and each one is able to do it differently. Add 4 new clustering options that deprecates the existing riscv-misched-load-store-clustering and riscv-postmisched-load-store-clustering options: - riscv-misched-load-clustering and riscv-misched-store-clustering: enable/disable load/store clustering during misched; - riscv-postmisched-load-clustering and riscv-postmisched-store-clustering: enable/disable load/store clustering during PostRA. To preserve the existing clustering behavior all 4 options are defaulted to 'true'.
- add store-clustering=false to the existing LDCLUSTER test; - add a new STCLUSTER test with load-clustering=false; - add a new DEFAULTCLUSTER test with default clustering options.
Added 4 subtarget features that disables the default cluster settings: - TuneDisableMISched(Load|Store)Clustering - TuneDisablePostMISched(Load|Store)Clustering The 4 command line options added previously were removed. Use the new subtarget features in Ventana veyron-v1 to allow only misched store clustering. Test changes: - add a store clustering test: misched-store-clustering.ll - add subtarget features tests in misched-load-clustering.ll and misched-store-clustering.ll - misched-mem-clustering.mir: use the subtarget features
Instead of using NoMISchedLoadClustering and setting it to 'true' using the feature, rename it to NoMISchedLoadClustering and make the feature set it to 'false'. Rename the subtarget method disableMISchedLoadClustering() to enableMISchedLoadClustering() to remove the negative check in createMachineScheduler() when enabling the clustering. Do the same with the other 3 subtarget features.
We can get the existing enableXXClustering methods to be autogenerated by prefixing the enums with 'Enable', e.g. 'EnableMISchedLoadClustering', and an autogenerated 'enableMISchedLoadClustering' method will be created.
170d0ae
to
a743f80
Compare
Got it. I asked @mgudim for help. |
@danielhb Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
I just merged it on your behalf |
Some processors benefit more from store clustering than load clustering,
and vice-versa, depending on factors that are exclusive to each one
(e.g. macrofusions implemented).
Likewise, certain optimizations benefits more from misched clustering
than postRA clustering. Macrofusions are again an example: in a
processor with store pair macrofusions, like the veyron-v1, it is
observed that misched clustering increases the amount of macrofusions
more than postRA clustering. This of course isn't necessarily true for
other processors, but it shows that processors can benefit from a more
fine grained control of clustering mutations, and each one is able to do
it differently.
Add 4 new subtarget features that deprecates the existing
riscv-misched-load-store-clustering and riscv-postmisched-load-store-clustering
options:
disable-misched-load-clustering and disable-misched-store-clustering:
disable load/store clustering during misched;
disable-postmisched-load-clustering and disable-postmisched-store-clustering:
disable load/store clustering during PostRA.
Note that the new subtarget features disables specific stages of the default
clustering settings. The default per se (load and store clustering for both
misched and PostRA) is left untouched.
Disable all clustering but misched-store-clustering for the veyron-v1 processor
using the new features.